{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "7H3yTncQfoym"
      },
      "source": [
        "##### Copyright 2019 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "cellView": "form",
        "colab": {},
        "colab_type": "code",
        "id": "h1CiDh7CfqON"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "pp-UaomMQJNo"
      },
      "source": [
        "# Neural Machine Translation with Luong Attention - Tensorflow 2.0\n",
        "\n",
        "_Notebook orignially contributed by: [chunml](https://github.com/chunml)_\n",
        "\n",
        "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github.com/tensorflow/examples/blob/master/community/en/nn_from_scratch.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/examples/tree/master/community/en/nn_from_scratch.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "\u003c/table\u003e"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "NBhbug0Qgmuu"
      },
      "source": [
        "## Overview\n",
        "In this notebook, we will create a neural machine translation model using Sequence-To-Sequence learning and Luong attention mechanism. We will walkthrough the following steps:\n",
        "\n",
        "- Create a vanilla Seq2Seq model that can overfit a tiny dataset of 20 English-French sentence pairs\n",
        "- Train the vanilla Seq2Seq model on the full dataset\n",
        "- Introduce Luong attention mechanism, compare results and visualize attention heatmaps\n",
        "\n",
        "For a more detailed explanation on Luong attention mechanism and implementation, refer to my blog post at: [Neural Machine Translation With Attention Mechanism](https://machinetalk.org/2019/03/29/neural-machine-translation-with-attention-mechanism/)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "zK2SZnszhSYk"
      },
      "source": [
        "## Setup\n",
        "First, we need to install Tensorflow 2.0."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "S1sepk9uMddm"
      },
      "outputs": [],
      "source": [
        "import tensorflow as tf\n",
        "assert tf.__version__.startswith('2')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "CH3QqWQ1RiP2"
      },
      "source": [
        "## Import and prepare data\n",
        "\n",
        "Next, let's import the necessary packages and define the tiny dataset for overfitting. The tiny dataset contains only 20 pairs of English - French sentences, which were extracted from the original dataset we will use later on."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "m3za66lwMjWX"
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "import unicodedata\n",
        "import re\n",
        "\n",
        "raw_data = (\n",
        "    ('What a ridiculous concept!', 'Quel concept ridicule !'),\n",
        "    ('Your idea is not entirely crazy.', \"Votre idée n'est pas complètement folle.\"),\n",
        "    (\"A man's worth lies in what he is.\", \"La valeur d'un homme réside dans ce qu'il est.\"),\n",
        "    ('What he did is very wrong.', \"Ce qu'il a fait est très mal.\"),\n",
        "    (\"All three of you need to do that.\", \"Vous avez besoin de faire cela, tous les trois.\"),\n",
        "    (\"Are you giving me another chance?\", \"Me donnez-vous une autre chance ?\"),\n",
        "    (\"Both Tom and Mary work as models.\", \"Tom et Mary travaillent tous les deux comme mannequins.\"),\n",
        "    (\"Can I have a few minutes, please?\", \"Puis-je avoir quelques minutes, je vous prie ?\"),\n",
        "    (\"Could you close the door, please?\", \"Pourriez-vous fermer la porte, s'il vous plaît ?\"),\n",
        "    (\"Did you plant pumpkins this year?\", \"Cette année, avez-vous planté des citrouilles ?\"),\n",
        "    (\"Do you ever study in the library?\", \"Est-ce que vous étudiez à la bibliothèque des fois ?\"),\n",
        "    (\"Don't be deceived by appearances.\", \"Ne vous laissez pas abuser par les apparences.\"),\n",
        "    (\"Excuse me. Can you speak English?\", \"Je vous prie de m'excuser ! Savez-vous parler anglais ?\"),\n",
        "    (\"Few people know the true meaning.\", \"Peu de gens savent ce que cela veut réellement dire.\"),\n",
        "    (\"Germany produced many scientists.\", \"L'Allemagne a produit beaucoup de scientifiques.\"),\n",
        "    (\"Guess whose birthday it is today.\", \"Devine de qui c'est l'anniversaire, aujourd'hui !\"),\n",
        "    (\"He acted like he owned the place.\", \"Il s'est comporté comme s'il possédait l'endroit.\"),\n",
        "    (\"Honesty will pay in the long run.\", \"L'honnêteté paye à la longue.\"),\n",
        "    (\"How do we know this isn't a trap?\", \"Comment savez-vous qu'il ne s'agit pas d'un piège ?\"),\n",
        "    (\"I can't believe you're giving up.\", \"Je n'arrive pas à croire que vous abandonniez.\"),\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "za_UE4kZnKo7"
      },
      "source": [
        "## Preprocess the data\n",
        "\n",
        "The raw data can't be used yet. We need to apply some preprocessing steps such as:\n",
        "- Convert strings from unicode to ascii\n",
        "- Add space before punctuation\n",
        "- Filter out unwanted tokens\n",
        "- Add *start* and *end* tokens to target sentences"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "ejidHNJnie7h"
      },
      "outputs": [],
      "source": [
        "def unicode_to_ascii(s):\n",
        "    return ''.join(\n",
        "        c for c in unicodedata.normalize('NFD', s)\n",
        "        if unicodedata.category(c) != 'Mn')\n",
        "\n",
        "\n",
        "def normalize_string(s):\n",
        "    s = unicode_to_ascii(s)\n",
        "    s = re.sub(r'([!.?])', r' \\1', s)\n",
        "    s = re.sub(r'[^a-zA-Z.!?]+', r' ', s)\n",
        "    s = re.sub(r'\\s+', r' ', s)\n",
        "    return s\n",
        "\n",
        "\n",
        "raw_data_en, raw_data_fr = list(zip(*raw_data))\n",
        "raw_data_en, raw_data_fr = list(raw_data_en), list(raw_data_fr)\n",
        "raw_data_en = [normalize_string(data) for data in raw_data_en]\n",
        "raw_data_fr_in = ['\u003cstart\u003e ' + normalize_string(data) for data in raw_data_fr]\n",
        "raw_data_fr_out = [normalize_string(data) + ' \u003cend\u003e' for data in raw_data_fr]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "X9NBnWhSnvn4"
      },
      "source": [
        "## Tokenize the raw data\n",
        "\n",
        "Since neural networks only accept numeric arrays as input, we need to tokenize our data, i.e. convert the sentences to integer sequences. This task can be done easily with **Keras** preprocessing utility class."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "Y84daISunw_X"
      },
      "outputs": [],
      "source": [
        "en_tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='')\n",
        "en_tokenizer.fit_on_texts(raw_data_en)\n",
        "data_en = en_tokenizer.texts_to_sequences(raw_data_en)\n",
        "data_en = tf.keras.preprocessing.sequence.pad_sequences(data_en,\n",
        "                                                        padding='post')\n",
        "\n",
        "fr_tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='')\n",
        "fr_tokenizer.fit_on_texts(raw_data_fr_in)\n",
        "fr_tokenizer.fit_on_texts(raw_data_fr_out)\n",
        "data_fr_in = fr_tokenizer.texts_to_sequences(raw_data_fr_in)\n",
        "data_fr_in = tf.keras.preprocessing.sequence.pad_sequences(data_fr_in,\n",
        "                                                           padding='post')\n",
        "\n",
        "data_fr_out = fr_tokenizer.texts_to_sequences(raw_data_fr_out)\n",
        "data_fr_out = tf.keras.preprocessing.sequence.pad_sequences(data_fr_out,\n",
        "                                                            padding='post')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "KVjT6C2SqmH0"
      },
      "source": [
        "## Create tf.data.Dataset\n",
        "\n",
        "Next, we need to create an instance of tf.data.Dataset, which will help us create batches and iterate over the entire dataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "KNSJqjY7qnCe"
      },
      "outputs": [],
      "source": [
        "BATCH_SIZE = 5\n",
        "dataset = tf.data.Dataset.from_tensor_slices(\n",
        "    (data_en, data_fr_in, data_fr_out))\n",
        "dataset = dataset.shuffle(20).batch(BATCH_SIZE)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "8c4bNlGLrd5T"
      },
      "source": [
        "## Create the Encoder\n",
        "\n",
        "Now that we have done with the data preparation step, let's move on to creating the model. The vanilla Seq2Seq model consists of an Encoder and a Decoder.\n",
        "\n",
        "The Encoder only has an embedding layer and an RNN layer (can be either vanilla RNN, LSTM or GRU). We also need a method to initialize the state (zero-state)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "K8Sh1UaQrc5D"
      },
      "outputs": [],
      "source": [
        "class Encoder(tf.keras.Model):\n",
        "    def __init__(self, vocab_size, embedding_size, lstm_size):\n",
        "        super(Encoder, self).__init__()\n",
        "        self.lstm_size = lstm_size\n",
        "        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_size)\n",
        "        self.lstm = tf.keras.layers.LSTM(\n",
        "            lstm_size, return_sequences=True, return_state=True)\n",
        "\n",
        "    def call(self, sequence, states):\n",
        "        embed = self.embedding(sequence)\n",
        "        output, state_h, state_c = self.lstm(embed, initial_state=states)\n",
        "\n",
        "        return output, state_h, state_c\n",
        "\n",
        "    def init_states(self, batch_size):\n",
        "        return (tf.zeros([batch_size, self.lstm_size]),\n",
        "                tf.zeros([batch_size, self.lstm_size]))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "8eCxbz9ZSwjr"
      },
      "source": [
        "## Create the Decoder\n",
        "\n",
        "Next, we will create the Decoder. Basically, the Decoder is similar to the Encoder, with an additional Dense layer to map the RNN output to vocabulary space."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "eAN1ONp6MvI7"
      },
      "outputs": [],
      "source": [
        "class Decoder(tf.keras.Model):\n",
        "    def __init__(self, vocab_size, embedding_size, lstm_size):\n",
        "        super(Decoder, self).__init__()\n",
        "        self.lstm_size = lstm_size\n",
        "        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_size)\n",
        "        self.lstm = tf.keras.layers.LSTM(\n",
        "            lstm_size, return_sequences=True, return_state=True)\n",
        "        self.dense = tf.keras.layers.Dense(vocab_size)\n",
        "\n",
        "    def call(self, sequence, state):\n",
        "        embed = self.embedding(sequence)\n",
        "        lstm_out, state_h, state_c = self.lstm(embed, state)\n",
        "        logits = self.dense(lstm_out)\n",
        "\n",
        "        return logits, state_h, state_c"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "268qdJGGTULP"
      },
      "source": [
        "## Build the model\n",
        "\n",
        "We will not use Keras' *compile* and *fit* methods this time, so we have to manually build the model. The easiest way to achieve that is to feed in some dummy data. We can also print out the outputs' shapes for debugging purpose."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "t8nw7mdKMxvb"
      },
      "outputs": [],
      "source": [
        "en_vocab_size = len(en_tokenizer.word_index) + 1\n",
        "fr_vocab_size = len(fr_tokenizer.word_index) + 1\n",
        "\n",
        "EMBEDDING_SIZE = 32\n",
        "LSTM_SIZE = 64\n",
        "\n",
        "encoder = Encoder(en_vocab_size, EMBEDDING_SIZE, LSTM_SIZE)\n",
        "decoder = Decoder(fr_vocab_size, EMBEDDING_SIZE, LSTM_SIZE)\n",
        "\n",
        "initial_states = encoder.init_states(1)\n",
        "encoder_outputs = encoder(tf.constant([[1, 2, 3]]), initial_states)\n",
        "decoder_outputs = decoder(tf.constant([[1, 2, 3]]), encoder_outputs[1:])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "VioflTOhTnwl"
      },
      "source": [
        "## Create the loss function and the optimizer\n",
        "\n",
        "Next, let's create the loss function. Since we apply zero padding to the source and target sequences, we need to mask those zeros out when computing the loss.\n",
        "\n",
        "As for the optimizer, we will use Adam with default setting."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "azgmhikhM0EE"
      },
      "outputs": [],
      "source": [
        "def loss_func(targets, logits):\n",
        "    crossentropy = tf.keras.losses.SparseCategoricalCrossentropy(\n",
        "        from_logits=True)\n",
        "    mask = tf.math.logical_not(tf.math.equal(targets, 0))\n",
        "    mask = tf.cast(mask, dtype=tf.int64)\n",
        "    loss = crossentropy(targets, logits, sample_weight=mask)\n",
        "\n",
        "    return loss\n",
        "\n",
        "optimizer = tf.keras.optimizers.Adam()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "JccKWvNYTtUW"
      },
      "source": [
        "## The train_step function\n",
        "\n",
        "Technically, that's what we call a function which run a full iteration, i.e. a forward pass followed by a backward pass. Since Tensorflow 2.0, we can use the *@tf.function* decorator to explicitly put a particular piece of code in static graph execution. If you want to debug, don't forget to remove it."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "-JwpigAlM2dF"
      },
      "outputs": [],
      "source": [
        "@tf.function\n",
        "def train_step(source_seq, target_seq_in, target_seq_out, en_initial_states):\n",
        "    with tf.GradientTape() as tape:\n",
        "        en_outputs = encoder(source_seq, en_initial_states)\n",
        "        en_states = en_outputs[1:]\n",
        "        de_states = en_states\n",
        "\n",
        "        de_outputs = decoder(target_seq_in, de_states)\n",
        "        logits = de_outputs[0]\n",
        "        loss = loss_func(target_seq_out, logits)\n",
        "\n",
        "    variables = encoder.trainable_variables + decoder.trainable_variables\n",
        "    gradients = tape.gradient(loss, variables)\n",
        "    optimizer.apply_gradients(zip(gradients, variables))\n",
        "\n",
        "    return loss"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "3VPPwifdjKvE"
      },
      "source": [
        "## The predict function\n",
        "\n",
        "It's always a good idea to see how well the model can do during training, since only monitoring the loss values doesn't tell us that much. Basically, the predict function only does a forward pass. However, on the Decoder's side, we will start with the *start* token. At every next time step, the output of the previous step will be used as the input of the current step."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "QROTesdAj5Oc"
      },
      "outputs": [],
      "source": [
        "def predict(test_source_text=None):\n",
        "    if test_source_text is None:\n",
        "        test_source_text = raw_data_en[np.random.choice(len(raw_data_en))]\n",
        "    print(test_source_text)\n",
        "    test_source_seq = en_tokenizer.texts_to_sequences([test_source_text])\n",
        "    print(test_source_seq)\n",
        "\n",
        "    en_initial_states = encoder.init_states(1)\n",
        "    en_outputs = encoder(tf.constant(test_source_seq), en_initial_states)\n",
        "\n",
        "    de_input = tf.constant([[fr_tokenizer.word_index['\u003cstart\u003e']]])\n",
        "    de_state_h, de_state_c = en_outputs[1:]\n",
        "    out_words = []\n",
        "\n",
        "    while True:\n",
        "        de_output, de_state_h, de_state_c = decoder(\n",
        "            de_input, (de_state_h, de_state_c))\n",
        "        de_input = tf.argmax(de_output, -1)\n",
        "        out_words.append(fr_tokenizer.index_word[de_input.numpy()[0][0]])\n",
        "\n",
        "        if out_words[-1] == '\u003cend\u003e' or len(out_words) \u003e= 20:\n",
        "            break\n",
        "\n",
        "    print(' '.join(out_words))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "5LoKa0_vmJMc"
      },
      "source": [
        "## The training loop\n",
        "\n",
        "Here comes the training loop. We will train for 300 epochs and print out the loss value together with the translation result of a random English sentence (within the training data). Doing so will help us know if something went wrong along the way.\n",
        "\n",
        "At first, the model made only non-sense translations, but we can see that it keeps getting better and better over time."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "8SumEGA4nzBE"
      },
      "outputs": [],
      "source": [
        "NUM_EPOCHS = 300\n",
        "for e in range(NUM_EPOCHS):\n",
        "    en_initial_states = encoder.init_states(BATCH_SIZE)\n",
        "    \n",
        "    predict()\n",
        "\n",
        "    for batch, (source_seq, target_seq_in, target_seq_out) in enumerate(dataset.take(-1)):\n",
        "        loss = train_step(source_seq, target_seq_in,\n",
        "                          target_seq_out, en_initial_states)\n",
        "\n",
        "    print('Epoch {} Loss {:.4f}'.format(e + 1, loss.numpy()))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "HDmZAVdWotL8"
      },
      "source": [
        "## Let's do a full inference\n",
        "Finally, let's have the model translate all 20 training pairs. We can see that at this point, the model has completely learned by heart the training data, which is what we wanted at the beginning."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "txCNxzzUo-7e"
      },
      "outputs": [],
      "source": [
        "test_sents = (\n",
        "    'What a ridiculous concept!',\n",
        "    'Your idea is not entirely crazy.',\n",
        "    \"A man's worth lies in what he is.\",\n",
        "    'What he did is very wrong.',\n",
        "    \"All three of you need to do that.\",\n",
        "    \"Are you giving me another chance?\",\n",
        "    \"Both Tom and Mary work as models.\",\n",
        "    \"Can I have a few minutes, please?\",\n",
        "    \"Could you close the door, please?\",\n",
        "    \"Did you plant pumpkins this year?\",\n",
        "    \"Do you ever study in the library?\",\n",
        "    \"Don't be deceived by appearances.\",\n",
        "    \"Excuse me. Can you speak English?\",\n",
        "    \"Few people know the true meaning.\",\n",
        "    \"Germany produced many scientists.\",\n",
        "    \"Guess whose birthday it is today.\",\n",
        "    \"He acted like he owned the place.\",\n",
        "    \"Honesty will pay in the long run.\",\n",
        "    \"How do we know this isn't a trap?\",\n",
        "    \"I can't believe you're giving up.\",\n",
        ")\n",
        "\n",
        "for test_sent in test_sents:\n",
        "    test_sequence = normalize_string(test_sent)\n",
        "    predict(test_sequence)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "x1M6wnXpsmUI"
      },
      "source": [
        "## Let's train Seq2Seq on the full dataset\n",
        "Now, we have created a fully functional Seq2Seq model. Let's use it to train on the full dataset, which contains approximately 160K English - French pairs. That's huge!\n",
        "\n",
        "But don't worry since we don't have to make any change to the current workflow. We only need to download the full dataset, tweak the networks' hyperparameters, reinitialize everything and we are ready to train.\n",
        "\n",
        "The training is gonna take a (long) while, so be patient!"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "M378dMUTtZ-K"
      },
      "outputs": [],
      "source": [
        "import requests\n",
        "import os\n",
        "from zipfile import ZipFile\n",
        "\n",
        "MODE = 'train'\n",
        "URL = 'http://www.manythings.org/anki/fra-eng.zip'\n",
        "FILENAME = 'fra-eng.zip'\n",
        "BATCH_SIZE = 64\n",
        "EMBEDDING_SIZE = 256\n",
        "LSTM_SIZE = 512\n",
        "NUM_EPOCHS = 15\n",
        "\n",
        "\n",
        "# ================= DOWNLOAD AND READ THE DATA ======================\n",
        "def maybe_download_and_read_file(url, filename):\n",
        "    if not os.path.exists(filename):\n",
        "        session = requests.Session()\n",
        "        response = session.get(url, stream=True)\n",
        "\n",
        "        CHUNK_SIZE = 32768\n",
        "        with open(filename, \"wb\") as f:\n",
        "            for chunk in response.iter_content(CHUNK_SIZE):\n",
        "                if chunk:\n",
        "                    f.write(chunk)\n",
        "\n",
        "    zipf = ZipFile(filename)\n",
        "    filename = zipf.namelist()\n",
        "    with zipf.open('fra.txt') as f:\n",
        "        lines = f.read()\n",
        "\n",
        "    return lines\n",
        "\n",
        "\n",
        "lines = maybe_download_and_read_file(URL, FILENAME)\n",
        "lines = lines.decode('utf-8')\n",
        "\n",
        "raw_data = []\n",
        "for line in lines.split('\\n'):\n",
        "    raw_data.append(line.split('\\t'))\n",
        "\n",
        "raw_data = raw_data[:-1]\n",
        "\n",
        "\n",
        "# ================= TOKENIZATION AND ZERO PADDING ===================\n",
        "raw_data_en, raw_data_fr = zip(*raw_data)\n",
        "raw_data_en = [normalize_string(data) for data in raw_data_en]\n",
        "raw_data_fr_in = ['\u003cstart\u003e ' + normalize_string(data) for data in raw_data_fr]\n",
        "raw_data_fr_out = [normalize_string(data) + ' \u003cend\u003e' for data in raw_data_fr]\n",
        "\n",
        "en_tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='')\n",
        "en_tokenizer.fit_on_texts(raw_data_en)\n",
        "data_en = en_tokenizer.texts_to_sequences(raw_data_en)\n",
        "data_en = tf.keras.preprocessing.sequence.pad_sequences(data_en,\n",
        "                                                        padding='post')\n",
        "\n",
        "fr_tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='')\n",
        "fr_tokenizer.fit_on_texts(raw_data_fr_in)\n",
        "fr_tokenizer.fit_on_texts(raw_data_fr_out)\n",
        "data_fr_in = fr_tokenizer.texts_to_sequences(raw_data_fr_in)\n",
        "data_fr_in = tf.keras.preprocessing.sequence.pad_sequences(data_fr_in,\n",
        "                                                           padding='post')\n",
        "\n",
        "data_fr_out = fr_tokenizer.texts_to_sequences(raw_data_fr_out)\n",
        "data_fr_out = tf.keras.preprocessing.sequence.pad_sequences(data_fr_out,\n",
        "                                                            padding='post')\n",
        "\n",
        "dataset = tf.data.Dataset.from_tensor_slices(\n",
        "    (data_en, data_fr_in, data_fr_out))\n",
        "dataset = dataset.shuffle(len(raw_data_en)).batch(\n",
        "    BATCH_SIZE, drop_remainder=True)\n",
        "\n",
        "# ======================== BUILD THE MODEL ==========================\n",
        "en_vocab_size = len(en_tokenizer.word_index) + 1\n",
        "encoder = Encoder(en_vocab_size, EMBEDDING_SIZE, LSTM_SIZE)\n",
        "initial_state = encoder.init_states(1)\n",
        "test_encoder_output = encoder(tf.constant(\n",
        "    [[1, 23, 4, 5, 0, 0]]), initial_state)\n",
        "\n",
        "fr_vocab_size = len(fr_tokenizer.word_index) + 1\n",
        "decoder = Decoder(fr_vocab_size, EMBEDDING_SIZE, LSTM_SIZE)\n",
        "de_initial_state = test_encoder_output[1:]\n",
        "test_decoder_output = decoder(tf.constant(\n",
        "    [[1, 3, 5, 7, 9, 0, 0, 0]]), de_initial_state)\n",
        "\n",
        "# ================ ADD GRADIENT CLIPPING OPTION =====================\n",
        "optimizer = tf.keras.optimizers.Adam(clipnorm=5.0)\n",
        "\n",
        "# ================== THIS NEEDS TO BE RE-RUN ========================\n",
        "@tf.function\n",
        "def train_step(source_seq, target_seq_in, target_seq_out, en_initial_states):\n",
        "    with tf.GradientTape() as tape:\n",
        "        en_outputs = encoder(source_seq, en_initial_states)\n",
        "        en_states = en_outputs[1:]\n",
        "        de_states = en_states\n",
        "\n",
        "        de_outputs = decoder(target_seq_in, de_states)\n",
        "        logits = de_outputs[0]\n",
        "        loss = loss_func(target_seq_out, logits)\n",
        "\n",
        "    variables = encoder.trainable_variables + decoder.trainable_variables\n",
        "    gradients = tape.gradient(loss, variables)\n",
        "    optimizer.apply_gradients(zip(gradients, variables))\n",
        "\n",
        "    return loss\n",
        "\n",
        "# ===================== THE TRAINING LOOP ===========================\n",
        "for e in range(NUM_EPOCHS):\n",
        "    en_initial_states = encoder.init_states(BATCH_SIZE)\n",
        "    \n",
        "    predict()\n",
        "\n",
        "    for batch, (source_seq, target_seq_in, target_seq_out) in enumerate(dataset.take(-1)):\n",
        "        loss = train_step(source_seq, target_seq_in,\n",
        "                          target_seq_out, en_initial_states)\n",
        "        if batch % 100 == 0:\n",
        "            print('Epoch {} Batch {} Loss {:.4f}'.format(\n",
        "                e + 1, batch, loss.numpy()))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Yqgly68R8R4C"
      },
      "source": [
        "## It's time for test\n",
        "So after a while, our vanilla Seq2Seq has completed its training. Let's see how well it can translate. For sake of simplicity, we will use the same 20 English - French pairs that we defined above."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "8I-y-A1I8rwL"
      },
      "outputs": [],
      "source": [
        "test_sents = (\n",
        "    'What a ridiculous concept!',\n",
        "    'Your idea is not entirely crazy.',\n",
        "    \"A man's worth lies in what he is.\",\n",
        "    'What he did is very wrong.',\n",
        "    \"All three of you need to do that.\",\n",
        "    \"Are you giving me another chance?\",\n",
        "    \"Both Tom and Mary work as models.\",\n",
        "    \"Can I have a few minutes, please?\",\n",
        "    \"Could you close the door, please?\",\n",
        "    \"Did you plant pumpkins this year?\",\n",
        "    \"Do you ever study in the library?\",\n",
        "    \"Don't be deceived by appearances.\",\n",
        "    \"Excuse me. Can you speak English?\",\n",
        "    \"Few people know the true meaning.\",\n",
        "    \"Germany produced many scientists.\",\n",
        "    \"Guess whose birthday it is today.\",\n",
        "    \"He acted like he owned the place.\",\n",
        "    \"Honesty will pay in the long run.\",\n",
        "    \"How do we know this isn't a trap?\",\n",
        "    \"I can't believe you're giving up.\",\n",
        ")\n",
        "\n",
        "for test_sent in test_sents:\n",
        "    test_sequence = normalize_string(test_sent)\n",
        "    predict(test_sequence)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "69D5Cu6V8xir"
      },
      "source": [
        "Hmm, somehow acceptable, I think. But obviously, there are a lot of rooms for improvement. We're gonna look at the vanilla Seq2Seq's problems and see what we can do."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "hem0E_NwyDCJ"
      },
      "source": [
        "## Let's talk about Attention Mechanisms\n",
        "As we saw from the result above, while the vanilla Seq2Seq can overfit a small dataset, it struggled with learning from a much larger one, especially long sequences.\n",
        "\n",
        "A solution for that? Well, what if all the time steps within the Decoder gain access to the Encoder's state and have the right to decide which part to focus onto? That's the idea behind attention mechanisms."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "v0WjdVkKzmGT"
      },
      "source": [
        "## What is an attention layer made of?\n",
        "So, we understand the idea of having an attention mechanism. Let's see things from the inside. Technically, we need to compute the following:\n",
        "- The alignment vector\n",
        "- The context vector\n",
        "\n",
        "The alignment vector has the same length as the source sequence. Each of its values is the score (or the probability) of the corresponding word within the source sequence.\n",
        "\n",
        "The context vector, on the other hand, is simply the weighted average of the Encoder's output. It will then be used by the Decoder to compute the final output.\n",
        "\n",
        "If you want a fancier explanation, please refer to my blog post (link above). I have a bunch of images there.\n",
        "\n",
        "Below is how we implement Luong attention in Python."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "jp3mNnzn2fBY"
      },
      "outputs": [],
      "source": [
        "class LuongAttention(tf.keras.Model):\n",
        "    def __init__(self, rnn_size, attention_func):\n",
        "        super(LuongAttention, self).__init__()\n",
        "        self.attention_func = attention_func\n",
        "\n",
        "        if attention_func not in ['dot', 'general', 'concat']:\n",
        "            raise ValueError(\n",
        "                'Unknown attention score function! Must be either dot, general or concat.')\n",
        "\n",
        "        if attention_func == 'general':\n",
        "            # General score function\n",
        "            self.wa = tf.keras.layers.Dense(rnn_size)\n",
        "        elif attention_func == 'concat':\n",
        "            # Concat score function\n",
        "            self.wa = tf.keras.layers.Dense(rnn_size, activation='tanh')\n",
        "            self.va = tf.keras.layers.Dense(1)\n",
        "\n",
        "    def call(self, decoder_output, encoder_output):\n",
        "        if self.attention_func == 'dot':\n",
        "            # Dot score function: decoder_output (dot) encoder_output\n",
        "            # decoder_output has shape: (batch_size, 1, rnn_size)\n",
        "            # encoder_output has shape: (batch_size, max_len, rnn_size)\n",
        "            # =\u003e score has shape: (batch_size, 1, max_len)\n",
        "            score = tf.matmul(decoder_output, encoder_output, transpose_b=True)\n",
        "        elif self.attention_func == 'general':\n",
        "            # General score function: decoder_output (dot) (Wa (dot) encoder_output)\n",
        "            # decoder_output has shape: (batch_size, 1, rnn_size)\n",
        "            # encoder_output has shape: (batch_size, max_len, rnn_size)\n",
        "            # =\u003e score has shape: (batch_size, 1, max_len)\n",
        "            score = tf.matmul(decoder_output, self.wa(\n",
        "                encoder_output), transpose_b=True)\n",
        "        elif self.attention_func == 'concat':\n",
        "            # Concat score function: va (dot) tanh(Wa (dot) concat(decoder_output + encoder_output))\n",
        "            # Decoder output must be broadcasted to encoder output's shape first\n",
        "            decoder_output = tf.tile(\n",
        "                decoder_output, [1, encoder_output.shape[1], 1])\n",
        "\n",
        "            # Concat =\u003e Wa =\u003e va\n",
        "            # (batch_size, max_len, 2 * rnn_size) =\u003e (batch_size, max_len, rnn_size) =\u003e (batch_size, max_len, 1)\n",
        "            score = self.va(\n",
        "                self.wa(tf.concat((decoder_output, encoder_output), axis=-1)))\n",
        "\n",
        "            # Transpose score vector to have the same shape as other two above\n",
        "            # (batch_size, max_len, 1) =\u003e (batch_size, 1, max_len)\n",
        "            score = tf.transpose(score, [0, 2, 1])\n",
        "\n",
        "        # alignment a_t = softmax(score)\n",
        "        alignment = tf.nn.softmax(score, axis=2)\n",
        "\n",
        "        # context vector c_t is the weighted average sum of encoder output\n",
        "        context = tf.matmul(alignment, encoder_output)\n",
        "\n",
        "        return context, alignment"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "XFo37SX32xcY"
      },
      "source": [
        "## Update the Decoder to use Luong Attention\n",
        "In order to use the attention above, we need to make a couple of small changes. We will start off with the Decoder."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "NjPtx5613o9p"
      },
      "outputs": [],
      "source": [
        "class Decoder(tf.keras.Model):\n",
        "    def __init__(self, vocab_size, embedding_size, rnn_size, attention_func):\n",
        "        super(Decoder, self).__init__()\n",
        "        self.attention = LuongAttention(rnn_size, attention_func)\n",
        "        self.rnn_size = rnn_size\n",
        "        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_size)\n",
        "        self.lstm = tf.keras.layers.LSTM(\n",
        "            rnn_size, return_sequences=True, return_state=True)\n",
        "        self.wc = tf.keras.layers.Dense(rnn_size, activation='tanh')\n",
        "        self.ws = tf.keras.layers.Dense(vocab_size)\n",
        "\n",
        "    def call(self, sequence, state, encoder_output):\n",
        "        # Remember that the input to the decoder\n",
        "        # is now a batch of one-word sequences,\n",
        "        # which means that its shape is (batch_size, 1)\n",
        "        embed = self.embedding(sequence)\n",
        "\n",
        "        # Therefore, the lstm_out has shape (batch_size, 1, rnn_size)\n",
        "        lstm_out, state_h, state_c = self.lstm(embed, initial_state=state)\n",
        "\n",
        "        # Use self.attention to compute the context and alignment vectors\n",
        "        # context vector's shape: (batch_size, 1, rnn_size)\n",
        "        # alignment vector's shape: (batch_size, 1, source_length)\n",
        "        context, alignment = self.attention(lstm_out, encoder_output)\n",
        "\n",
        "        # Combine the context vector and the LSTM output\n",
        "        # Before combined, both have shape of (batch_size, 1, rnn_size),\n",
        "        # so let's squeeze the axis 1 first\n",
        "        # After combined, it will have shape of (batch_size, 2 * rnn_size)\n",
        "        lstm_out = tf.concat(\n",
        "            [tf.squeeze(context, 1), tf.squeeze(lstm_out, 1)], 1)\n",
        "\n",
        "        # lstm_out now has shape (batch_size, rnn_size)\n",
        "        lstm_out = self.wc(lstm_out)\n",
        "\n",
        "        # Finally, it is converted back to vocabulary space: (batch_size, vocab_size)\n",
        "        logits = self.ws(lstm_out)\n",
        "\n",
        "        return logits, state_h, state_c, alignment"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "OZwGJvV03wv7"
      },
      "source": [
        "## Rebuild the model\n",
        "Next, let's rebuild the model. Notice that we must now pass the Encoder's output to the Decoder."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "_L9e4HgY4Atk"
      },
      "outputs": [],
      "source": [
        "# Set the score function to compute alignment vectors\n",
        "# Can choose between 'dot', 'general' or 'concat'\n",
        "ATTENTION_FUNC = 'concat'\n",
        "\n",
        "encoder = Encoder(en_vocab_size, EMBEDDING_SIZE, LSTM_SIZE)\n",
        "\n",
        "decoder = Decoder(fr_vocab_size, EMBEDDING_SIZE, LSTM_SIZE, ATTENTION_FUNC)\n",
        "\n",
        "# These lines can be used for debugging purpose\n",
        "# Or can be seen as a way to build the models\n",
        "initial_state = encoder.init_states(1)\n",
        "encoder_outputs = encoder(tf.constant([[1]]), initial_state)\n",
        "decoder_outputs = decoder(tf.constant(\n",
        "    [[1]]), encoder_outputs[1:], encoder_outputs[0])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "IDuMQpT54LQM"
      },
      "source": [
        "## Modify the train_step function\n",
        "Next, we need to modify the function for training to compute the Decoder's output one step at a time, which means that we need to explicitly create a loop."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "dlf1bZBG40ET"
      },
      "outputs": [],
      "source": [
        "@tf.function\n",
        "def train_step(source_seq, target_seq_in, target_seq_out, en_initial_states):\n",
        "    loss = 0\n",
        "    with tf.GradientTape() as tape:\n",
        "        en_outputs = encoder(source_seq, en_initial_states)\n",
        "        en_states = en_outputs[1:]\n",
        "        de_state_h, de_state_c = en_states\n",
        "\n",
        "        # We need to create a loop to iterate through the target sequences\n",
        "        for i in range(target_seq_out.shape[1]):\n",
        "            # Input to the decoder must have shape of (batch_size, length)\n",
        "            # so we need to expand one dimension\n",
        "            decoder_in = tf.expand_dims(target_seq_in[:, i], 1)\n",
        "            logit, de_state_h, de_state_c, _ = decoder(\n",
        "                decoder_in, (de_state_h, de_state_c), en_outputs[0])\n",
        "\n",
        "            # The loss is now accumulated through the whole batch\n",
        "            loss += loss_func(target_seq_out[:, i], logit)\n",
        "\n",
        "    variables = encoder.trainable_variables + decoder.trainable_variables\n",
        "    gradients = tape.gradient(loss, variables)\n",
        "    optimizer.apply_gradients(zip(gradients, variables))\n",
        "\n",
        "    return loss / target_seq_out.shape[1]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "RbzD0nsU42Wz"
      },
      "source": [
        "## Modify the predict function\n",
        "The change to make in the predict function is simple, we need to collect and return the alignment vectors. They will help generate some fancy attention heatmaps for visualization."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "uatef7NK5RbI"
      },
      "outputs": [],
      "source": [
        "def predict(test_source_text=None):\n",
        "    if test_source_text is None:\n",
        "        test_source_text = raw_data_en[np.random.choice(len(raw_data_en))]\n",
        "    print(test_source_text)\n",
        "    test_source_seq = en_tokenizer.texts_to_sequences([test_source_text])\n",
        "    print(test_source_seq)\n",
        "\n",
        "    en_initial_states = encoder.init_states(1)\n",
        "    en_outputs = encoder(tf.constant(test_source_seq), en_initial_states)\n",
        "\n",
        "    de_input = tf.constant([[fr_tokenizer.word_index['\u003cstart\u003e']]])\n",
        "    de_state_h, de_state_c = en_outputs[1:]\n",
        "    out_words = []\n",
        "    alignments = []\n",
        "\n",
        "    while True:\n",
        "        de_output, de_state_h, de_state_c, alignment = decoder(\n",
        "            de_input, (de_state_h, de_state_c), en_outputs[0])\n",
        "        de_input = tf.expand_dims(tf.argmax(de_output, -1), 0)\n",
        "        out_words.append(fr_tokenizer.index_word[de_input.numpy()[0][0]])\n",
        "\n",
        "        alignments.append(alignment.numpy())\n",
        "\n",
        "        if out_words[-1] == '\u003cend\u003e' or len(out_words) \u003e= 20:\n",
        "            break\n",
        "\n",
        "    print(' '.join(out_words))\n",
        "    return np.array(alignments), test_source_text.split(' '), out_words"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "j2FbQhgL5ZTH"
      },
      "source": [
        "## Let's train Seq2Seq with Luong attention\n",
        "The training loop is basically the same, although I recommend putting the call to predict function inside a *try-catch* to prevent crashing when the model's prediction is zero."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "EKK9owPY6I7U"
      },
      "outputs": [],
      "source": [
        "for e in range(NUM_EPOCHS):\n",
        "    en_initial_states = encoder.init_states(BATCH_SIZE)\n",
        "    \n",
        "    for batch, (source_seq, target_seq_in, target_seq_out) in enumerate(dataset.take(-1)):\n",
        "        loss = train_step(source_seq, target_seq_in,\n",
        "                          target_seq_out, en_initial_states)\n",
        "        if batch % 100 == 0:\n",
        "            print('Epoch {} Batch {} Loss {:.4f}'.format(\n",
        "                e + 1, batch, loss.numpy()))\n",
        "\n",
        "    try:\n",
        "        predict()\n",
        "        predict(\"How are you today ?\")\n",
        "    except Exception:\n",
        "        continue"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "FONBVqB-7Yuo"
      },
      "source": [
        "## Let's make some translations\n",
        "Similar to what we did with the vanilla Seq2Seq, it's time to see how the model with Luong attention can translate the same 20 pairs of English - French sentences.\n",
        "\n",
        "Furthermore, we can also visualize where the model focused on when making translation."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "YYG2kXMn8M7X"
      },
      "outputs": [],
      "source": [
        "import matplotlib.pyplot as plt\n",
        "import imageio\n",
        "\n",
        "if not os.path.exists('heatmap'):\n",
        "    os.makedirs('heatmap')\n",
        "\n",
        "test_sents = (\n",
        "    'What a ridiculous concept!',\n",
        "    'Your idea is not entirely crazy.',\n",
        "    \"A man's worth lies in what he is.\",\n",
        "    'What he did is very wrong.',\n",
        "    \"All three of you need to do that.\",\n",
        "    \"Are you giving me another chance?\",\n",
        "    \"Both Tom and Mary work as models.\",\n",
        "    \"Can I have a few minutes, please?\",\n",
        "    \"Could you close the door, please?\",\n",
        "    \"Did you plant pumpkins this year?\",\n",
        "    \"Do you ever study in the library?\",\n",
        "    \"Don't be deceived by appearances.\",\n",
        "    \"Excuse me. Can you speak English?\",\n",
        "    \"Few people know the true meaning.\",\n",
        "    \"Germany produced many scientists.\",\n",
        "    \"Guess whose birthday it is today.\",\n",
        "    \"He acted like he owned the place.\",\n",
        "    \"Honesty will pay in the long run.\",\n",
        "    \"How do we know this isn't a trap?\",\n",
        "    \"I can't believe you're giving up.\",\n",
        ")\n",
        "\n",
        "filenames = []\n",
        "\n",
        "for i, test_sent in enumerate(test_sents):\n",
        "    test_sequence = normalize_string(test_sent)\n",
        "    alignments, source, prediction = predict(test_sequence)\n",
        "    attention = np.squeeze(alignments, (1, 2))\n",
        "    fig = plt.figure(figsize=(10, 10))\n",
        "    ax = fig.add_subplot(1, 1, 1)\n",
        "    ax.matshow(attention, cmap='jet')\n",
        "    ax.set_xticklabels([''] + source, rotation=90)\n",
        "    ax.set_yticklabels([''] + prediction)\n",
        "\n",
        "    filenames.append('heatmap/test_{}.png'.format(i))\n",
        "    plt.savefig('heatmap/test_{}.png'.format(i))\n",
        "    plt.close()\n",
        "\n",
        "with imageio.get_writer('translation_heatmaps.gif', mode='I', duration=2) as writer:\n",
        "    for filename in filenames:\n",
        "        image = imageio.imread(filename)\n",
        "        writer.append_data(image)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "kkUfYpHZ9Hjm"
      },
      "source": [
        "As you can see, we can tell by feeling that the model with Luong attention did make better translations than the vanilla model. For a more accurate metric, you can compute the BLEU scores and compare between the two.\n",
        "\n",
        "As for the GIF image, we can download to our local machine and use any program of your choice to open it."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "zAUUr7b399su"
      },
      "outputs": [],
      "source": [
        "from google.colab import files\n",
        "\n",
        "files.download('translation_heatmaps.gif')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "6k8OxsPF-G_t"
      },
      "source": [
        "## Final words\n",
        "And that is that. We have finished creating a neural machine translation model with the renown Seq2Seq model and Luong-style attention mechanism.\n",
        "\n",
        "For a more detailed explanation, feel free to jump to my blog post at [Neural Machine Translation With Attention Mechanism](https://machinetalk.org/2019/03/29/neural-machine-translation-with-attention-mechanism/).\n",
        "\n",
        "If you have any problems, don't hesitate to let me know. Thank you for reading such a long post."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "hUmoed-i_FCV"
      },
      "source": [
        "## Reference\n",
        "- Neural Machine Translation With Attention Mechanism: [link](https://machinetalk.org/2019/03/29/neural-machine-translation-with-attention-mechanism/)\n",
        "- Effective Approaches to Attention-based Neural Machine Translation paper (Luong attention): [link](https://arxiv.org/abs/1508.04025)\n",
        "- Tensorflow Neural Machine Translation with (Bahdanau) Attention tutorial: [link](https://www.tensorflow.org/alpha/tutorials/sequences/nmt_with_attention)\n",
        "- Luong’s Neural Machine Translation repository: [link](https://github.com/tensorflow/nmt)"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "collapsed_sections": [],
      "name": "nn_from_scratch.ipynb",
      "private_outputs": true,
      "provenance": [],
      "toc_visible": true,
      "version": "0.3.2"
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
